Search results for "Floating point"
Showing 7 of 7 documents
Nonlinear systems solver in floating-point arithmetic using LP reduction
2009
This paper presents a new solver for systems of nonlinear equations. Such systems occur in Geometric Constraint Solving, e.g., when dimensioning parts in CAD-CAM, or when computing the topology of sets defined by nonlinear inequalities. The paper does not consider the problem of decomposing the system and assembling solutions of subsystems. It focuses on the numerical resolution of well-constrained systems. Instead of computing an exponential number of coefficients in the tensorial Bernstein basis, we resort to linear programming for computing range bounds of system equations or domain reductions of system variables. Linear programming is performed on a so called Bernstein polytope: though,…
A dynamic program analysis to find floating-point accuracy problems
2012
Programs using floating-point arithmetic are prone to accuracy problems caused by rounding and catastrophic cancellation. These phenomena provoke bugs that are notoriously hard to track down: the program does not necessarily crash and the results are not necessarily obviously wrong, but often subtly inaccurate. Further use of these values can lead to catastrophic errors. In this paper, we present a dynamic program analysis that supports the programmer in finding accuracy problems. Our analysis uses binary translation to perform every floating-point computation side by side in higher precision. Furthermore, we use a lightweight slicing approach to track the evolution of errors. We evaluate our…
Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives
2016
It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity (RI) approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as the use of optimized device math libraries, the operations involved in the energy kernels have been ported to graphics processing unit (GPU) accelerators, and the associated data transfers correspondingly o…
LARGE-SCALE SIMULATIONS IN CONDENSED MATTER PHYSICS —THE NEED FOR A TERAFLOP COMPUTER
1992
The introduction of vector processors {“supercomputers” with a performance in the range of 10^9 floating point operations (1 GFLOP) per second} has had an enormous impact on computational condensed matter physics. The possibility of a substantially enhanced performance by massively parallel processors (“teraflop” machines with 10^12 floating point operations per second) will allow satisfactory treatment of a large range of important scientific problems which have to a great extent thus far escaped numerical resolution. The present paper describes only a few examples (out of a long list of interesting research problems!) for which the availability of “teraflops” will allow spectacular progres…
Hardware-efficient matrix inversion algorithm for complex adaptive systems
2012
This work shows an FPGA implementation for the matrix inversion algebra operation. Usually, large matrix dimension is required for real-time signal processing applications, especially in case of complex adaptive systems. A hardware efficient matrix inversion procedure is described using QR decomposition of the original matrix and modified Gram-Schmidt method. This work attempts a direct VHDL description using few predefined packages and fixed point arithmetic for better optimization. New proposals for intermediate calculations are described, leading to efficient logic occupation together with better performance and accuracy in the vector space algebra. Results show that, for a relatively s…
A Novel Systolic Parallel Hardware Architecture for the FPGA Acceleration of Feedforward Neural Networks
2019
New chips for machine learning applications are appearing; they are tuned for a specific topology and achieve efficiency through highly parallel designs at the cost of high power or large complex devices. However, the computational demands of deep neural networks require flexible and efficient hardware architectures able to fit different applications, neural network types, number of inputs, outputs, layers, and units in each layer, making the migration from software to hardware easy. This paper describes novel hardware implementing any feedforward neural network (FFNN): multilayer perceptron, autoencoder, and logistic regression. The architecture admits an arbitrary input and output number, units in la…
Accelerated fluctuation analysis by graphic cards and complex pattern formation in financial markets
2009
The compute unified device architecture is an almost conventional programming approach for managing computations on a graphics processing unit (GPU) as a data-parallel computing device. With a maximum number of 240 cores in combination with a high memory bandwidth, a recent GPU offers resources for computational physics. We apply this technology to methods of fluctuation analysis, which includes determination of the scaling behavior of a stochastic process and the equilibrium autocorrelation function. Additionally, the recently introduced pattern formation conformity (Preis T et al 2008 Europhys. Lett. 82 68005), which quantifies pattern-based complex short-time correlations of a time serie…